
did you test this in a production environment?

Preparing to, and I found a pilot project with an "endboss" PDF which I'm already able to handle... but I focus on 100% quality in the DB, so there is always a human-in-the-loop step below 100% confidence. Can't wait to prove I'm able to have a "hallucination-free" RAG... that's the main goal so far. The next headache will be updating the data in the RAG.

“I see you’re dynamically generating schemas per dataset. Have you explored using a domain ontology or building a reusable ontology layer so the agent’s outputs can interoperate more consistently across projects?”


why would someone want this data?


are you using n8n for the workflow builder?


Nope, we built the workflow builder entirely ourselves, but I guess most workflow builders do end up looking very similar!


That strikes me as a surprising choice given the amount of prebuilt integrations with n8n


One factor, in hindsight, for doing this in-house: we found that AI can struggle with understanding and navigating existing workflow builders that were built and optimized for human usage and comprehension. For example, which nodes are available, the options that can be set inside those nodes, and even how they are named all had quite an impact on whether the AI could reliably form valid workflows on its own.


Not that surprising if you take the n8n license into account. It's very prohibitive.


Can you expand on "did a lot of a/b tests and conversion experiments"? Any posts I can go read to learn more?


Do your higher-ups notice you produce fewer bugs? Is this something you communicate to others, or do you let your actions speak for themselves?

thanks :)


We have an internal bug reporting process where internal developers talk about why each other's code isn't working for them.

Perhaps I need to add a function to accommodate a specific item we didn't anticipate. This isn't a bug.

However, if I have a function that does x, y, and z, and it turns out it can't do z, or it doesn't do y as it should, then we consider this a bug. We count these numbers and we ask each other to submit something that states their logic and design process. We also consider inadequate design or the lack of a process to be a bug. Poor planning.

We also track how long it takes us to fix these bugs and implement the original design. We also ask each other for estimates of how long it would take them to implement, and we build a schedule on an average of those estimates.

I just ran the query for last year. I had 1 bug each quarter. One of the developers has 178. The last hire has almost 400.

I implemented a plan to reduce draw call overhead. It took me 3 months. That was almost double what other developers estimated it would take them. So far just one issue has been reported, and it was more an error of using too few words in the code comments, but I added an `assert` just to warn the developer to look at this, and I also fixed the code comments and provided an example.
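The assert-as-a-warning pattern described above can be sketched roughly like this (a hypothetical illustration; the function name, data shape, and message are mine, not from the actual codebase):

```python
# Hypothetical sketch: an assert that documents a design assumption so the
# next developer is pointed at the comments instead of hitting a silent
# failure later.

def merge_draw_batches(batches):
    """Merge per-material draw batches; assumes every batch is non-empty."""
    # Empty batches were not anticipated by the original design. The assert
    # fails loudly and tells the reader where to look before changing this.
    assert all(batch for batch in batches), (
        "empty batch passed to merge_draw_batches; "
        "read the batching notes in the comments before relying on this"
    )
    merged = []
    for batch in batches:
        merged.extend(batch)
    return merged

print(merge_draw_batches([[1, 2], [3]]))  # prints [1, 2, 3]
```

The point is less the check itself than that the failure message routes the reader to the documented design, which matches the "state your logic and design process" culture described above.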

Edit: Yes, higher-ups do, and they look at this quarterly. I'm not sure what they do with the info, to be honest.


No, it wasn't my first job; I have 2 years of experience. But I feel like every time I start a job, I get this feedback over and over for the first 2 months, yet a few months later no one has any negative feedback. What do people expect from someone's first month? It feels like they expect full competency within a certain timeframe (less than a month). I have trouble finding out other people's expectations, so I don't know how to meet what I don't know.


Don't worry about it then. Some people are faster at starting than others. Just do your best and don't worry about it until this happens again.


When I left, they asked my friend to start working weekends, so maybe.


That suggests leadership that is overstressed and has a poor understanding of the nature of software engineering productivity.


So this theory could make sense. Some practical advice: don't be too hard on yourself. If someone says you're slow without backing that claim with data and/or comparisons, you're slow compared to what? Taking the startup context into account, I'd hardly be surprised if they were just trying to make you overwork.


thanks for the recommendation


Don't you then need to run the experiment for a long, long time to reach significance? Plus the site needs enough users viewing the page to run it in any reasonable amount of time. I would think most sites don't get enough traffic to run an A/A test, let alone an A/B test?
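The traffic concern can be made concrete with a back-of-the-envelope sample-size calculation using the standard two-proportion z-test approximation (the function name and default z-values are my own choices, assuming a two-sided 5% significance level and 80% power):

```python
import math

def sample_size_per_arm(p_baseline, p_variant):
    """Approximate users needed per arm for a two-proportion z-test.

    Assumes alpha = 0.05 (two-sided) and 80% power, so the z-values
    below are fixed constants rather than parameters.
    """
    z_alpha = 1.96  # two-sided 5% significance
    z_beta = 0.84   # 80% power
    variance = p_baseline * (1 - p_baseline) + p_variant * (1 - p_variant)
    effect = abs(p_variant - p_baseline)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Detecting a lift from a 5% to a 6% conversion rate needs roughly
# 8,146 users per arm, which supports the point above: most small
# sites simply don't have the traffic.
print(sample_size_per_arm(0.05, 0.06))  # prints 8146
```

Smaller effects are worse still, since the required sample grows roughly with the inverse square of the effect size.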

